Using a Random Forest proximity measure for variable importance stratification in genotypic data

Authors

  • José Antonio Seoane Fernández
  • Ian N. M. Day
  • Colin Campbell
  • Juan P. Casas
  • Tom R. Gaunt
Abstract

In this work we study variable significance in classification using the Random Forest proximity matrix and local importance matrix. We use the proximity matrix to group the samples into a number of clusters and use these clusters to stratify the importance of a variable. We apply this approach to a cardiovascular genotype dataset for sample classification based on coronary heart disease, and we find a number of variations related to cardiovascular disease phenotypes. We also use a set of phenotypes related to this genotype data to match the obtained clusters with coronary heart disease phenotypes.
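The pipeline described in the abstract can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation. It assumes the forest has already been trained and that two inputs are available: a matrix of leaf indices (one row per sample, one column per tree) and a precomputed local importance matrix (one row per sample, one column per variable). The function names, the threshold-based clustering rule, and the toy numbers are all hypothetical, since the abstract does not say which clustering method was used.

```python
import numpy as np

def proximity_matrix(leaves):
    """Fraction of trees in which each pair of samples lands in the same leaf.

    leaves: (n_samples, n_trees) array of leaf indices, e.g. as returned by
    scikit-learn's RandomForestClassifier.apply(X).
    """
    leaves = np.asarray(leaves)
    n_samples, n_trees = leaves.shape
    prox = np.zeros((n_samples, n_samples))
    for t in range(n_trees):
        col = leaves[:, t]
        prox += (col[:, None] == col[None, :])
    return prox / n_trees

def cluster_by_proximity(prox, threshold=0.5):
    """Assumed clustering rule: link samples whose proximity is at least
    `threshold` and take connected components of the resulting graph."""
    n = prox.shape[0]
    labels = np.full(n, -1)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            # Unlabelled samples that are close enough to sample i.
            for j in np.flatnonzero((prox[i] >= threshold) & (labels == -1)):
                labels[j] = current
                stack.append(j)
        current += 1
    return labels

def stratify_importance(local_importance, labels):
    """Mean local importance of each variable within each cluster."""
    local_importance = np.asarray(local_importance)
    return {int(c): local_importance[labels == c].mean(axis=0)
            for c in np.unique(labels)}

# Toy inputs (made up for illustration): 5 samples, 3 trees, 2 variables.
leaves = np.array([[0, 0, 1],
                   [0, 0, 1],
                   [0, 1, 1],
                   [2, 3, 4],
                   [2, 3, 4]])
local_importance = np.array([[0.2, 0.0],
                             [0.4, 0.0],
                             [0.3, 0.0],
                             [0.0, 0.5],
                             [0.0, 0.7]])

prox = proximity_matrix(leaves)
labels = cluster_by_proximity(prox, threshold=0.5)
stratified = stratify_importance(local_importance, labels)
```

In this toy case the first variable matters only in one cluster and the second only in the other, which is the kind of pattern a single forest-wide importance score would average away. In practice a hierarchical or partitioning method applied to the dissimilarity 1 - proximity would be a natural alternative to the threshold rule assumed here.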


Similar resources

Gene Selection Using Random Forest and Proximity Differences Criterion on DNA Microarray Data

Selection of relevant genes for sample classification is a common task in most gene expression studies. As a powerful classification approach, random forest has been applied in this field, and it shows excellent performance compared with other classification methods. The measure of variable importance is the key of gene selection using random forest. However, the existing methods just consider ...


Random Forest Visualization

Classification is the process of assigning a class label to an observation based on its properties or attributes. A classification algorithm is applied to a data set, producing a model. By studying the model, insights about the data set structure can be gained. The benefits that a model can bring depend on the model. In this work, a Random Forest model is used for the analysis of data. A Rando...


Letter to the Editor: On the stability and ranking of predictors from random forest variable importance measures

A recent study examined the stability of rankings from random forests using two variable importance measures (mean decrease accuracy (MDA) and mean decrease Gini (MDG)) and concluded that rankings based on the MDG were more robust than MDA. However, studies examining data-specific characteristics on ranking stability have been few. Rankings based on the MDG measure showed sensitivity to within-...


A Random Forest proximity matrix as a new measure for gene annotation

In this paper we present a new score for gene annotation. This new score is based on the proximity matrix obtained from a trained Random Forest (RF) model. As an example application, we built this model using the association p-values of genotype with blood phenotype as input and the association of genotype data with coronary heart disease as output. This new score has been validated by comparing...


Classification of large datasets using Random Forest Algorithm in various applications: Survey

Random Forest is an ensemble classification algorithm widely used in many applications, especially with larger datasets, because of its outstanding features such as variable importance measures, OOB error detection, proximity among the features, and handling of imbalanced datasets. This paper discusses many applications which use Random Forest to classify datasets, such as network intrusion detection, ...



Publication date: 2014